---
id: pgvector
title: PGVector
sidebar_label: PGVector
---

## Quick Summary

PGVector is an open-source PostgreSQL extension that enables **semantic search** and similarity-based retrieval directly within PostgreSQL, making it a scalable, SQL-native solution for LLM applications and RAG pipelines. Learn more about PGVector [here](https://github.com/pgvector/pgvector).

When building your **PGVector** retriever, you'll have to define hyperparameters like `LIMIT` and the `embedding model` to encode your text chunks. DeepEval can help you optimize these parameters by evaluating how well your PGVector retriever does under different hyperparameter combinations:

:::info
To get started, install PGVector and the PostgreSQL client using the following command:

```
pip install psycopg2 pgvector
```

:::

## Setup PGVector

To interact with a PostgreSQL database from Python, we'll use the `psycopg2` library, which provides a low-level database adapter following the PostgreSQL client-server protocol, to connect to our database. This connection allows us to execute SQL queries, fetch results, and manage transactions.

```python
import psycopg2
import os

# Connect to PostgreSQL database
conn = psycopg2.connect(
    dbname="your_database",
    user="your_user",
    password=os.getenv("PG_PASSWORD"),  # Set in environment variable
    host="localhost",
    port="5432"
)
cursor = conn.cursor()
```

Next, you'll need to create a table to store `text` chunks along with their corresponding embedding `vectors`. To enable vector operations, you'll need to activate the `pgvector` extension.

```python
# Enable the pgvector extension (only needed once)
cursor.execute("CREATE EXTENSION IF NOT EXISTS vector;")

# Define table schema for text and embeddings
cursor.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        text TEXT,
        embedding vector(384)  -- Defines a 384-dimension vector
    );
""")
conn.commit()
```

Finally, you'll need to convert your document chunks into vectors using an embedding model and store them in PostgreSQL. We'll use `all-MiniLM-L6-v2` from `sentence-transformers` to generate embeddings and insert them into the `documents` table.

```python
# Load an embedding model
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example document chunks
document_chunks = [
    "PGVector brings vector search to PostgreSQL.",
    "RAG improves AI-generated responses with retrieved context.",
    "Vector search enables high-precision semantic retrieval.",
    ...
]

# Store chunks with embeddings in PGVector
for chunk in document_chunks:
    embedding = model.encode(chunk).tolist()  # Convert text to vector
    cursor.execute(
        "INSERT INTO documents (text, embedding) VALUES (%s, %s);",
        (chunk, embedding)
    )

conn.commit()
```

PGVector functions as the **retrieval engine** in your RAG pipeline, efficiently fetching relevant document chunks to provide your LLM generator with grounded context. The diagram below illustrates how PGVector integrates into your RAG pipeline.

<div
  style={{
    display: "flex",
    alignItems: "center",
    justifyContent: "center",
    flexDirection: "column",
  }}
>
  <img
    src="https://deepeval-docs.s3.us-east-1.amazonaws.com/pgvector.png"
    style={{
      margin: "10px",
      height: "auto",
      maxHeight: "800px",
    }}
  />
  <div
    style={{
      fontSize: "13px",
    }}
  >
    Source: HexaCluster
  </div>
</div>

## Evaluating PGVector Retrieval

Evaluating your PGVector retriever involves **two key steps**. First, you need to generate a test case by preparing an `input` query along with the expected LLM response. This `input` is then processed through your RAG pipeline to produce an `LLMTestCase`, which includes the query, actual output, expected output, and retrieved context.

Once the test case is created, the next step is to assess retrieval performance using a selection of evaluation metrics designed to measure the precision, recall, and relevance of the retrieved context.

### Preparing your Test Case

Since retrieving relevant `retrieval_context` from your PGVector table is the first step in generating a response from your RAG pipeline, you need to perform a similarity search based on the `input` query. The function below encodes the `input` query into an embedding and retrieves the `top-K` (or `LIMIT`) most similar document chunks using cosine similarity.

```python
...

def search(query, top_k=3):
    query_embedding = model.encode(query).tolist()

    cursor.execute("""
        SELECT text FROM documents
        ORDER BY embedding <-> %s  -- Use <-> for cosine similarity
        LIMIT %s;
    """, (query_embedding, top_k))

    return [row[0] for row in cursor.fetchall()]

query = "How does PGVector work?"
retrieval_context = search(query)
```

Next, we'll insert the `retrieval_context` retrieved from the vector database into our prompt template to generate an LLM response, referred to as `actual_output`. This step finalizes the required parameters needed to construct an `LLMTestCase`.

```python
from deepeval.test_case import LLMTestCase

...

prompt = """
Answer the user question based on the supporting context

User Question:
{input}

Supporting Context:
{retrieval_context}
"""

actual_output = generate(prompt) # hypothetical function, replace with your own LLM
print(actual_output)
# PGVector enables efficient vector search within PostgreSQL for AI applications.

test_case = LLMTestCase(
    input=input,
    actual_output=actual_output,
    retrieval_context=retrieval_context,
    expected_output="PGVector is an extension that brings efficient vector search capabilities to PostgreSQL.",
)
```

### Running Evaluations

Before evaluating the `LLMTestCase`, we need to define `deepeval` metrics that measure the effectiveness of the PGVector retriever. Key retrieval metrics include **contextual recall**, **contextual precision**, and **contextual relevancy**, which assesses how well the retrieved `retrieval_context`.

:::info
You can learn more about these contextual metrics and why they're relevant to retriever evaluation in this [guide](/guides/guides-rag-evaluation).
:::

```python
from deepeval import evaluate
from deepeval.metrics import (
    ContextualRecallMetric,
    ContextualPrecisionMetric,
    ContextualRelevancyMetric,
)

...

contextual_recall = ContextualRecallMetric(),
contextual_precision = ContextualPrecisionMetric()
contextual_relevancy = ContextualRelevancyMetric()

evaluate(
    [test_case],
    metrics=[contextual_recall, contextual_precision, contextual_relevancy]
)
```

## Improving PGVector Retrieval

After running multiple test cases, let's assume that the **Contextual Precision** score is lower than expected. This suggests that while our retriever is fetching relevant contexts, some of them may not be the best match for the query, introducing noise into the response.

### Key Findings

| Query                                    | Contextual Precision Score | Contextual Recall Score |
| ---------------------------------------- | -------------------------- | ----------------------- |
| "How does PGVector store embeddings?"    | 0.42                       | 0.91                    |
| "Explain PGVector’s similarity search."  | 0.38                       | 0.87                    |
| "What makes PGVector efficient for RAG?" | 0.40                       | 0.85                    |

### Addressing Low Precision

Since **precision** measures how well the retrieved contexts align with the query, a lower score often means that some retrieved results are not as relevant as they should be. Possible improvements include:

- **Using a More Domain-Specific Embedding Model**  
  If your use case involves technical documentation, a general-purpose model like `all-MiniLM-L6-v2` may not be ideal. Consider testing models such as:

  - `BAAI/bge-small-en` for better retrieval ranking.
  - `sentence-transformers/msmarco-distilbert-base-v4` for dense passage retrieval.
  - `nomic-ai/nomic-embed-text-v1` for handling longer text chunks.

- **Optimizing Retrieval Parameters**

  - Adjust `LIMIT` in your retrieval query to control the number of retrieved results.

### Next Steps

After refining your retrieval strategy—whether by adjusting embedding models or tuning retrieval parameters—it's crucial to generate new test cases and reassess performance. Focus on **Contextual Precision**, as improvements here indicate a more accurate and relevant retrieval process.

:::info
For systematic retrieval evaluation and embedding model comparisons, use [Confident AI](https://www.confident-ai.com/).
:::
